OPTIMAL DESIGN OF EXPERIMENTS


Herman Chernoff*

Stanford  University 

1. INTRODUCTION. I would like to discuss some aspects of the
theory of optimal design of experiments, with particular emphasis on its
relevance to the practice of statistics. There are two major branches
of classical statistics, Estimation and Testing of Hypotheses, for which
the theory of optimal design yields different results. Because of the
time limitation, I shall confine my attention to certain results and
examples in the theory of estimation.

2. SOME EXAMPLES. To illustrate the theory let us consider
three examples. The first example is a well known one with a trivial
solution: that of estimating the slope of a regression (straight
line). More specifically we have

Example 1.

The experimenter may choose any number $y$ between $-1$ and $+1$.
This number $y$ designates an elementary experiment which corresponds
to observing

$$Z = \alpha + \beta y + u$$

where $u$ is normally distributed with mean 0 and variance 1 and $\alpha$
and $\beta$ are unknown parameters. The experimenter is permitted to
select a design consisting of $n$ values $y_1, y_2, \ldots, y_n$, with
possible repetitions. The design corresponds to performing the $n$ designated
experiments independently. It is desired to select a design which will
yield the best possible estimate of the slope $\beta$.

It is well known and it is intuitively obvious that the best design
consists of selecting $y = -1$ and $y = +1$ each half the time (provided
$n$ is even).
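
The claim can be checked numerically. The following is a minimal sketch, assuming the usual least-squares result that, with unit error variance, the variance of the slope estimate is $1/\sum_i (y_i - \bar{y})^2$. The function name `slope_variance` and the comparison design `equispaced` are illustrative choices, not part of the original discussion.

```python
def slope_variance(design):
    """Variance of the least-squares slope for Z = alpha + beta*y + u, Var(u) = 1."""
    n = len(design)
    ybar = sum(design) / n
    sxx = sum((y - ybar) ** 2 for y in design)
    return 1.0 / sxx

n = 10
# Design from the text: half the observations at -1, half at +1.
endpoints = [-1.0] * (n // 2) + [1.0] * (n // 2)
# Hypothetical comparison design: n levels spread evenly over [-1, 1].
equispaced = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

var_endpoints = slope_variance(endpoints)    # equals 1/n for this design
var_equispaced = slope_variance(equispaced)  # larger than 1/n

print(var_endpoints, var_equispaced)
```

Any design that places observations away from the end points reduces $\sum_i (y_i - \bar{y})^2$ and therefore increases the variance of the slope estimate.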


* This work was supported in part by Office of Naval Research Contract
Nonr-225(52)  at  Stanford  University.  Reproduction  in  whole  or  in  part 
Another example which is of some current interest, having been
discussed in yesterday's paper by Mr. Langlie on a problem in reliability,
and which is also relevant to the problem of Probit Analysis, may be
expressed as follows:


Example 2.

A device, which may be used only once, can operate successfully
under a stress $s$ with probability

$$p = \int_{(s-\mu)/\sigma}^{\infty} (2\pi)^{-1/2} e^{-t^2/2}\, dt .$$

In other words one may say that the strength of the device, as measured
by the maximum stress under which it will operate successfully, is
normally distributed with unknown mean $\mu$ and variance $\sigma^2$. It is
desired to select a design consisting of the choice of stress levels
$s_1, s_2, \ldots, s_n$ which will yield an optimal estimate of $\mu - k\sigma$. The
elementary experiment, designated $s$, consists of course of observing
the success or failure of the device when used under stress $s$.


Finally a third problem, which was discussed in detail in a recent
paper of mine [2], deals with accelerated life testing. Here we wish to
estimate the mean lifetime of a device when used under an environment
of ordinary stress conditions. If this mean lifetime is great and it is
desired to have the estimate soon, then it is necessary to accelerate:
the device is subjected to a much larger than ordinary stress. The
results of such accelerated life testing can be relevant only if one assumes
some form of relationship connecting the mean lifetime under various
stresses. As an approximation we shall assume a quadratic relationship
for some limited range. In addition, since time is of the essence, we
shall assume that the cost of observing a device under stress $s$ is
proportional to the mean lifetime under that stress. Let us be more specific.
Example 3.

A device under stress environment $s$ has lifetime $T$ with an
exponential distribution with failure rate (reciprocal of mean) given by

$$\theta_1 s + \theta_2 s^2 \quad \text{for } 0 < s \le s^*$$

where $\theta_1$ and $\theta_2$ are unknown parameters. It is desired to estimate
the failure rate under the ordinary stress $s_0$. This is

$$\varphi_0 = \theta_1 s_0 + \theta_2 s_0^2 .$$

An elementary experiment designated by $s$ consists of observing the
lifetime $T$ of a device subjected to the environment $s$. Since the cost
is proportional to the mean lifetime, the cost of the experiment $s$ is

$$C(s) = c\,(\theta_1 s + \theta_2 s^2)^{-1} .$$

It is desired to select a design consisting of experiments $s_1, s_2, \ldots$,
with $0 < s_i \le s^*$, so as to obtain an optimal estimate of $\varphi_0$ for a
specified total cost.

Each of these examples has certain elements in common. Each may
be regarded as a special case of the following general formulation. There
is a set $\mathcal{E}$ of available elementary experiments $e$. In each case the
distribution of the data of an experiment depends on the experiment and
on $k$ unknown parameters represented by $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$.
We wish to estimate some function $g(\theta_1, \theta_2, \ldots, \theta_k)$ of the
parameters. A design consists of the independent performance of experiments
$e_1, e_2, \ldots$ with possible repetitions. It is desired to find a design which
yields the best possible estimate of $g(\theta_1, \theta_2, \ldots, \theta_k)$ for a
specified total cost or for a specified number of observations.
3. THE LINEAR REGRESSION MODEL. In 1952, Elfving derived
an elegant geometric solution to the optimal design problem for a special
but important case of the above general formulation. As we shall see,
this result is applicable to a large variety of problems. Let $\mathcal{E}$ be a set
of experiments $e$ denoted by $(y_1, y_2)$. The experiment $e$ consists of
observing

$$Z = \theta_1 y_1 + \theta_2 y_2 + u$$

where $u$ is normally distributed with mean 0 and variance 1. It is
desired to obtain an optimal estimate of $a_1\theta_1 + a_2\theta_2$ using a design
consisting of $n$ observations. The first example of estimating the slope
of a straight line is a special case of Elfving's linear regression model
where $\mathcal{E}$ is the set of points $(1, y)$ with $-1 \le y \le 1$, and $(a_1, a_2) = (0, 1)$.

Elfving's solution consists of constructing a set $S$ which is the
smallest convex set containing the points $(y_1, y_2)$ of $\mathcal{E}$ and their
negatives $(-y_1, -y_2)$. Then extend the vector from $(0,0)$ to $(a_1, a_2)$ until it
penetrates the boundary of $S$. The point of penetration $(w_1, w_2)$ represents the
optimal design. If this point is one of the original points $(y_1, y_2)$ or
$(-y_1, -y_2)$, the optimal design consists of repeating $(y_1, y_2)$ $n$ times.
Otherwise the point of penetration is on a line segment connecting points
corresponding to two of the original experiments (or their negatives).
Then the optimal design consists of repeating these two experiments in
proportions determined by the distances from $(w_1, w_2)$ to the two points,
the greater proportion going to the experiment closer to $(w_1, w_2)$.
Finally the variance of the least squares estimate based on this design is

$$\sigma_0^2 = [n(w_1^2 + w_2^2)]^{-1}(a_1^2 + a_2^2) = a_1^2/(n w_1^2) = a_2^2/(n w_2^2) .$$
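
For a finite list of candidate experiments the construction can be imitated numerically. The sketch below is a brute-force stand-in for the geometric argument, not Elfving's own procedure. It examines every segment joining two of the signed points $\pm(y_1, y_2)$, finds where the ray through $(a_1, a_2)$ crosses it, and keeps the crossing farthest from the origin. The function name `elfving_2d` and the candidate grid are assumptions made for illustration.

```python
from itertools import combinations

def elfving_2d(points, a, tol=1e-12):
    """Return (t, support): t*a is where the ray through a leaves the convex
    hull of the points and their negatives; support lists (experiment, share)."""
    a1, a2 = a
    signed = [(i, (s * p[0], s * p[1])) for i, p in enumerate(points) for s in (1, -1)]
    best_t, best_support = 0.0, []
    for (i, P), (j, Q) in combinations(signed, 2):
        d1, d2 = P[0] - Q[0], P[1] - Q[1]
        det = a2 * d1 - a1 * d2
        if abs(det) < tol:
            continue  # segment parallel to the ray, or degenerate
        # Solve t*a = lam*P + (1 - lam)*Q for t and lam by Cramer's rule.
        t = (d1 * Q[1] - d2 * Q[0]) / det
        lam = (a1 * Q[1] - a2 * Q[0]) / det
        if t > best_t + tol and -tol <= lam <= 1.0 + tol:
            lam = min(1.0, max(0.0, lam))
            best_t = t
            best_support = [(points[i], lam), (points[j], 1.0 - lam)]
    return best_t, [(e, w) for e, w in best_support if w > tol]

# Example 1: experiments (1, y) on a grid in [-1, 1]; estimate the slope, a = (0, 1).
candidates = [(1.0, y) for y in (-1.0, -0.5, 0.0, 0.5, 1.0)]
t, support = elfving_2d(candidates, (0.0, 1.0))
n = 20
variance = 1.0 / (n * t * t)  # same as (a1^2 + a2^2) / (n * (w1^2 + w2^2))
print(t, sorted(support), variance)
```

The share attached to each end point is the weight with which it enters the convex combination, so the end point nearer to the penetration point receives the larger share, in agreement with the rule stated above.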


This solution can be illustrated with Example 1. Here $S$ is the
square whose corners are $(1, 1)$ and $(-1, -1)$, corresponding to $y = 1$, and
$(1, -1)$ and $(-1, 1)$, corresponding to $y = -1$. The line from $(0,0)$ through
$(a_1, a_2) = (0, 1)$ penetrates $S$ at $(0, 1)$, which is halfway between $(1, 1)$
and $(-1, 1)$. Thus the optimal design consists of repeating the
experiments corresponding to $y = 1$ and $y = -1$ each half the time (as was
well known). Furthermore the variance of the estimate of $\beta$ should be
$1/n$.

Elfving's result applies in the obvious fashion to experiments
involving $k$ parameters. Here we need repeat at most $k$ of the
available experiments in certain proportions to obtain the optimal estimate.

4. RESULTS FOR THE MORE GENERAL PROBLEM. As mentioned
in the preceding section, the problem treated by Elfving is a special
case of the more general one formulated in section 2. For this more
general problem, related results have been obtained [1]. These results
concern designs which are asymptotically locally optimal. We shall
defer the interpretation of these adjectives until the discussion of
Example 2 in section 5.

It was shown that asymptotically locally optimal designs depend on
the form of the matrix $J(e)$, which is defined as Fisher's information
matrix divided by the cost of the experiment $e$. In other words, if
experiment $e$ has cost $C(e)$ and yields data $X$ with probability
distribution $f(x, \theta, e)$, Fisher's information matrix is

$$I(e) = \left\| E\left\{ \frac{\partial \log f(X, \theta, e)}{\partial \theta_i}\, \frac{\partial \log f(X, \theta, e)}{\partial \theta_j} \right\} \right\|$$

and the information per unit cost is

$$J(e) = I(e)/C(e).$$

Clearly if the cost of experimentation is constant one need concern
oneself only with $I(e)$. The relevance of Fisher's information derives
from its well known additive properties and the fact that the maximum
likelihood estimate $\hat\theta_n$, based on the outcome of $n$ independent
repetitions of $e$, has an approximately normal distribution with mean $\theta$ and
covariance matrix $[nI(e)]^{-1}$ for large $n$.



When it is desired to estimate one function of the $k$ parameters,
there are asymptotically locally optimal designs which involve at most
$k$ of the experiments of $\mathcal{E}$ in certain proportions. This result, which
corresponds to one of Elfving's results, together with the use of Fisher's
information, permits one to reduce the calculation of optimal designs to
the maximization of a function of a fixed number of variables.


In the linear regression problem of Elfving, the information matrix
for $e = (y_1, y_2)$ is

$$I(e) = \| y_i y_j \|, \quad i, j = 1, 2.$$

Since asymptotically optimal designs are determined by the information
per unit cost, it follows that for any problem where $J(e)$ can be put in
the above form, the solution is the same as Elfving's with $a_i$ replaced
by $\partial g/\partial\theta_i$.


The  illustration  of  the  next  section  will  help  clarify  the  meaning  of 
these  results.  In  the  meantime  it  may  be  remarked  that  if  for  each 
experiment  the  distribution  of  the  outcome  depends  on  only  one  function 
of  the  parameters,  J(e)  can  be  put  in  the  above  form  and  Elfving's 
results  are  applicable.  In  particular  they  are  applicable  to  both  examples 
2 and  3. 

5. ILLUSTRATION. We shall find it informative to illustrate the
method with Example 2. Here the outcome of the experiment $s$ is
success or failure, where the probability of success is

$$p(s, \mu, \sigma) = \int_{(s-\mu)/\sigma}^{\infty} (2\pi)^{-1/2} e^{-t^2/2}\, dt = 1 - \Phi\!\left(\frac{s-\mu}{\sigma}\right),$$

where $\Phi$ is the normal cdf. In other words the role of the density
$f(X, \theta, e)$ is played by




$$f = p^X (1 - p)^{1-X}$$

where $X = 1$ for success and 0 for failure.


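$$\log f = X \log p + (1 - X) \log(1 - p)$$

$$\frac{\partial \log f}{\partial \mu} = \frac{X - p}{p(1 - p)}\, \frac{\partial p}{\partial \mu}$$

$$\frac{\partial \log f}{\partial \sigma} = \frac{X - p}{p(1 - p)}\, \frac{\partial p}{\partial \sigma}$$

Since $E(X - p)^2 = p(1 - p)$,

$$J(s) = I(s) = [p(1 - p)]^{-1} \left\| \begin{matrix} \left(\dfrac{\partial p}{\partial \mu}\right)^2 & \dfrac{\partial p}{\partial \mu}\,\dfrac{\partial p}{\partial \sigma} \\ \dfrac{\partial p}{\partial \sigma}\,\dfrac{\partial p}{\partial \mu} & \left(\dfrac{\partial p}{\partial \sigma}\right)^2 \end{matrix} \right\|$$

that is,

$$J(s) = \| y_i y_j \|$$

where

$$y_1(s) = [p(1-p)]^{-1/2}\, \frac{\partial p}{\partial \mu} = [2\pi p(1-p)]^{-1/2}\, \sigma^{-1} \exp[-(s-\mu)^2/2\sigma^2]$$

$$y_2(s) = [p(1-p)]^{-1/2}\, \frac{\partial p}{\partial \sigma} = [2\pi p(1-p)]^{-1/2}\, (s-\mu)\, \sigma^{-2} \exp[-(s-\mu)^2/2\sigma^2]$$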

Next we plot the set of points $(y_1(s), y_2(s))$ in Figure 1. We add the
negatives of these points and construct $S$, the smallest convex set
containing them. We note that for $s = \mu + t\sigma$, $y_2(s)/y_1(s) = t$. We also note
that $y_2(s)$ on the curve $(y_1(s), y_2(s))$ reaches its maximum and minimum at
$s = \mu \pm k_0\sigma$ where $k_0 = 1.57$. Finally, since we wish to estimate $\mu - k\sigma$,
we draw the vector from $(0, 0)$ through $(1, -k)$, i.e. the line through the
origin with slope $-k$, and note where it penetrates the convex set $S$.
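
The value of $k_0$ can be recovered numerically from the expressions for $y_1(s)$ and $y_2(s)$. The following is a rough sketch that writes $s = \mu + t\sigma$, so that the curve depends on $s$ only through $t$, and locates the maximum of $y_2$ by a ternary search. The helper names `Phi`, `y_point` and `find_k0`, the search interval, and the assumption that $y_2$ has a single peak on that interval are illustrative and are not taken from the text.

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def y_point(t, sigma=1.0):
    """(y1, y2) at stress s = mu + t*sigma."""
    p = 1.0 - Phi(t)  # probability of success at this stress
    dens = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    y1 = dens / (sigma * math.sqrt(p * (1.0 - p)))
    return y1, t * y1  # y2 / y1 = t

def find_k0(lo=0.5, hi=3.0, iters=100):
    """Ternary search for the t > 0 maximizing y2 (assumes one peak in [lo, hi])."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if y_point(m1)[1] < y_point(m2)[1]:
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

k0 = find_k0()
print(round(k0, 2))
```

Because $p(1-p)$ is unchanged when $t$ is replaced by $-t$, the minimum of $y_2$ lies at $t = -k_0$ with the same value of $y_1$, which is why the boundary of $S$ between the two extreme points is a straight segment.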

Clearly there are two cases.

Case 1. $|k| < k_0$. Here the vector penetrates $S$ at one of the original
$(y_1(s), y_2(s))$ points. In fact this point corresponds to $s = \mu - k\sigma$ and
hence the optimal design consists of using $s = \mu - k\sigma$ for all
observations.

Case 2. $|k| \ge k_0$. Here the vector penetrates $S$ at the straight line
section of the boundary. The optimal design consists of applying the
stress levels $\mu - k_0\sigma$ and $\mu + k_0\sigma$ in proportions $k + k_0$ to $k - k_0$.

In cases 1 and 2 the formal application of the formula for the variance
of the maximum likelihood estimate of $\mu - k\sigma$ based on the optimal
design gives

$$2\pi\sigma^2\, \Phi(k)\,[1 - \Phi(k)]\, e^{k^2}\, n^{-1}$$

in case 1, and

$$2\pi\sigma^2\, \Phi(k_0)\,[1 - \Phi(k_0)]\, e^{k_0^2}\, k_0^{-2}\, k^2\, n^{-1} = 1.64\,\sigma^2 k^2 n^{-1}$$

in case 2.
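
The two cases and their variance formulas can be collected in a few lines of code. This is only a sketch: the function names `optimal_design` and `asymptotic_variance` are hypothetical, and `K0 = 1.575` is an assumed three-decimal value for $k_0$, which the text gives as 1.57.

```python
import math

K0 = 1.575  # assumed three-decimal value of k_0

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_design(k, mu, sigma, k0=K0):
    """Return [(stress level, proportion), ...] for estimating mu - k*sigma."""
    if abs(k) < k0:
        return [(mu - k * sigma, 1.0)]  # Case 1: a single stress level
    # Case 2: proportions (k + k0) to (k - k0), normalized to sum to one.
    w_low = (k + k0) / (2.0 * k)
    return [(mu - k0 * sigma, w_low), (mu + k0 * sigma, 1.0 - w_low)]

def asymptotic_variance(k, sigma, n, k0=K0):
    """Formal asymptotic variance of the estimate of mu - k*sigma."""
    if abs(k) < k0:
        return 2.0 * math.pi * sigma ** 2 * Phi(k) * (1.0 - Phi(k)) * math.exp(k * k) / n
    const = 2.0 * math.pi * Phi(k0) * (1.0 - Phi(k0)) * math.exp(k0 * k0) / (k0 * k0)
    return const * sigma ** 2 * k * k / n

design = optimal_design(2.0, mu=0.0, sigma=1.0)
case2_constant = asymptotic_variance(2.0, sigma=1.0, n=1) / 2.0 ** 2
print(design, round(case2_constant, 2))
```

Dividing the case 2 variance by $\sigma^2 k^2 n^{-1}$ recovers the constant 1.64 quoted above, which gives a simple check on the reconstruction of both formulas.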

6. THE RELEVANCE OF OPTIMAL DESIGN. Now we shall find
the illustrative example helpful in interpreting the results of the theory
of optimal design of experiments and in understanding its relevance in
practical applications. For simplicity let us confine our attention to
case 2 at first.

First we note one very peculiar aspect of the optimal design. Since
it involves using stress levels $\mu - k_0\sigma$ and $\mu + k_0\sigma$, to apply it one
must know $\mu$ and $\sigma$. But if one knew $\mu$ and $\sigma$, there would be no
need to experiment. While this seems to be ridiculous, a glance at
Figure 1 indicates that if one used an approximation to $\mu \pm k_0\sigma$, one
would have a rather good approximation to the optimal design. Thus
there is surprisingly little loss of efficiency when one is not certain
about $\mu$ and $\sigma$. It is this property that the word local is used to describe.
In other words our design would be efficient if we knew the parameters and
is approximately efficient if we use an approximation to the unknown
parameters.

[Figure 1]
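
The size of the loss can be explored numerically. The sketch below compares the case 2 design built from the true $\mu$ and $\sigma$ with the same design built from an imperfect guess, using the asymptotic variance factor $a' M^{-1} a$ with $a = (1, -k)$ and $M = \sum_i w_i\, y(s_i)\, y(s_i)'$. The true values, the guessed values, `K0 = 1.575` and all function names are hypothetical choices made for illustration.

```python
import math

K0 = 1.575  # assumed three-decimal value of k_0

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def variance_factor(design, k, mu, sigma):
    """n times the asymptotic variance of the estimate of mu - k*sigma
    for a design [(stress, proportion), ...], evaluated at the true (mu, sigma)."""
    m11 = m12 = m22 = 0.0
    for s, w in design:
        t = (s - mu) / sigma
        p = 1.0 - Phi(t)
        dens = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        y1 = dens / (sigma * math.sqrt(p * (1.0 - p)))
        y2 = t * y1
        m11 += w * y1 * y1
        m12 += w * y1 * y2
        m22 += w * y2 * y2
    det = m11 * m22 - m12 * m12
    # a' M^{-1} a for a = (1, -k), using the explicit 2x2 inverse.
    return (m22 + 2.0 * k * m12 + k * k * m11) / det

def two_point_design(k, mu_guess, sigma_guess, k0=K0):
    """Case 2 design computed from guessed parameter values."""
    w_low = (k + k0) / (2.0 * k)
    return [(mu_guess - k0 * sigma_guess, w_low),
            (mu_guess + k0 * sigma_guess, 1.0 - w_low)]

mu, sigma, k = 0.0, 1.0, 2.0  # hypothetical true values, with k > k0
best = variance_factor(two_point_design(k, mu, sigma), k, mu, sigma)
guessed = variance_factor(two_point_design(k, 0.2, 1.1), k, mu, sigma)
efficiency = best / guessed  # 1.0 would mean no loss
print(best, guessed, efficiency)
```

Varying the guessed values in `two_point_design` shows how quickly the efficiency falls off as the guess moves away from the truth.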

This raises the issue of the adjective asymptotic. If one had a large
sample available, one could use some of the initial observations to derive
an initial estimate of $\mu$ and $\sigma$ on which to base an approximation to the
optimal design. Furthermore the qualification asymptotic derives from
a couple of other aspects. First, the properties relating the variance
of the approximate distribution of the maximum likelihood estimate to the
information matrix, and giving the efficiency of this estimate, are based on
asymptotic theory assuming large sample size. A second and relatively
minor point is illustrated by Example 1 if an odd number of observations
is available. The optimal design calls for putting half the observations
at $+1$ and half at $-1$. This is impossible in a trivial way when $n$ is odd.
On the other hand the effect of this impossibility is negligible when $n$ is
large.

Having seen how we must qualify the term optimal by the adjectives
local and asymptotic, we can now consider a more fundamental issue.
Briefly, our optimal design is simply impractical. Only in the rather
unrealistic context where I had absolute faith in the model would I
consider this as a solution. In fact, any reasonable statistician would insist
on using several other stress levels, at least to check on the model.

Another unreasonable aspect of our optimal design arises from its
derivation based on the single minded purpose of obtaining a good
estimate of one function $g(\theta_1, \theta_2, \ldots, \theta_k)$ of the parameters. In many
practical problems, experimentation is used to serve several purposes
simultaneously.

One may reasonably inquire what function the theory of
optimal design serves, if (1) the optimality must be qualified as locally
asymptotically optimal and (2) the designs it yields are unreasonable.
Basically the functions are the following. First, the theory provides a
yardstick for comparison purposes. If the designs proposed yesterday
by Mr. Langlie, or the Up and Down Method [3, p. 319], or some other
practical design turn out to be relatively efficient compared to our
solution (as measured by asymptotic variance), then clearly there is no point
in attempting to improve on this aspect of these methods. If, on the other
hand, one of these methods were to have a low efficiency, then one is
forced to delve deeper to see what, if anything, can be done to improve
the design.
Second, the theory not only presents an optimal design but indicates
rather clearly how this design can be modified with relatively low loss
of efficiency. The theory serves to direct the attention of the practical
statistician toward designs which combine relatively high efficiency with
practical utility when robustness and multi-purpose considerations are
taken into account.

7. MISCELLANEOUS COMMENTS. I would like to conclude this
paper with a few assorted comments. First, the proposed solution to
Example 2 in case 1, when $|k| < k_0$, consists of repeating one experiment
$n$ times. Not only is this solution impractical, but from a theoretical
point of view it represents a degenerate situation. When a single level
$s$ is used, one can use the data to estimate only

$$p(s, \mu, \sigma) = \int_{(s-\mu)/\sigma}^{\infty} (2\pi)^{-1/2} e^{-t^2/2}\, dt$$

or functions of $p(s, \mu, \sigma)$. Then one can check whether $(\mu - s)/\sigma$ is in
fact close to $k$ (as it should be if the design were optimal). But not
knowing $\sigma$, one cannot estimate $\mu - k\sigma$. Thus the formula for the
asymptotic variance presented at the end of section 5 is meaningful
only as an approximation to the case where several levels of stress
close to the optimal one were used. Alternatively one could regard
$p(1-p)n^{-1}$ as the asymptotic variance of the estimate of $p$.


For  a large  sample  sequential  procedure,  it  seems  clear  that  our 
theory  is  applicable.  If  one  were  to  reestimate  the  parameters  after 
each  observation,  and  use  these  estimates  to  derive  approximations  to 
the  optimal  design,  the  resulting  procedure  should  be  asymptotically 
optimal  in  the  sequential  version,  and  the  adjective  local  need  not  be 
applied. 

What is more interesting, perhaps, is the study of the "not so large"
sample sequential case. Here even the following seemingly simple
problem proposed by Harold Gumbel does not have a simple solution.
Suppose that experiment $e_i$ yields observation $X_i$ which is normally
distributed with unknown mean $\mu$ and unknown variance $\sigma_i^2$, $i = 1, 2$, and
it is desired to estimate $\mu$. In other words, two measuring instruments
of unknown accuracy are available. How should one select between the
two experiments sequentially so as to obtain a good estimate efficiently
when the sample size is not necessarily very large?